Abstract 

Human respiratory syncytial virus (HRSV) has three surface glycoproteins: small hydrophobic (SH), attachment (G) and fusion
(F), encoded by three consecutive genes (SH-G-F). A 270-nt fragment of the G gene is used to genotype HRSV isolates. This
study genotyped and investigated the variability of the gene and amino acid sequences of the three surface proteins of
HRSV strains collected from 1987 to 2005 at one center. Sixty original clinical isolates and 5 prototype strains were
analyzed. Sequences containing the SH, G and F genes were generated, and multiple alignments and phylogenetic trees were
analyzed. Genetic variability by protein domain was compared between virus genotypes. Complete sequences of the
SH-G-F genes were obtained for all 65 samples: HRSV-A = 35; HRSV-B = 30. Among group A strains, genotypes GA5 and GA2 were
predominant. Among HRSV-B strains, genotype GB4 was predominant from 1992 to 1994, and only genotype BA viruses
were detected in 2004-2005. Genetic variability at the nucleotide level differed between the genes: the G gene was the
most variable, with the highest variability in the 270-nt G fragment that is frequently used to genotype the virus.
High variability (>10%) was also detected in the signal peptide and transmembrane domains of the F gene of
HRSV-A strains. Variability among the HRSV strains resulting in non-synonymous changes was detected in the hypervariable
domains of the G protein, the signal peptide of the F protein, a previously undefined domain of the F protein, and
antigenic site 0 of the pre-fusion F protein. Divergent trends were observed between the HRSV-A and HRSV-B groups for some
functional domains. A diverse population of HRSV-A and HRSV-B genotypes circulated in Houston during an 18-year period.
We hypothesize that diverse sequence variation in the surface protein genes provides HRSV strains a survival advantage
in a partially immune-protected community. 
Introduction 

Human respiratory syncytial virus (HRSV) is the major cause of 
lower respiratory disease among infants [1,2], a frequent pathogen 
in elderly and immunosuppressed patients [3] and a major public 
health concern worldwide [4]. Although an immunoprophylactic 
approach [5] has been approved for infants with high-risk medical 
conditions (palivizumab or Synagis®) [6], no safe and effective 
vaccine has been licensed to date for use in humans. 

As a member of the family Paramyxoviridae, HRSV is a
nonsegmented negative-strand RNA virus [7]. The viral genome
of ~15,200 bases contains 10 genes, transcribed sequentially
from 3' to 5', that encode 11 proteins [7]. The small hydrophobic
(SH), attachment (G) and fusion (F) genes encode the three
surface proteins of the virus. Variation in monoclonal
antibody binding patterns, driven by antigenic diversity in the
G glycoprotein [8], has led to the identification of two antigenic
groups: HRSV-A and HRSV-B [9]. The G glycoprotein is a type
II glycoprotein that contains two hypervariable regions flanking a
non-glycosylated central conserved domain. Analysis of the
nucleotide sequence of the second hypervariable region of the G
gene has led to the classification of HRSV genotypes within the
HRSV-A and HRSV-B groups [10] and revealed that multiple
genotypes can co-circulate during the same epidemic season
[10,11]. Since then, many studies have used this approach to report
the molecular epidemiology and genetic variability of HRSV
worldwide [11-22]. 
For HRSV-A, at least ten genotypes (GA1-GA7 [10,11], SAA1
[22], NA1 and NA2 [21]) have been described, with GA2 and
GA5 predominating in many countries in recent years
[13,16,18,19]. Interestingly, a novel HRSV-A genotype with a
72-nucleotide G gene duplication (called genotype ON1) was
recently reported in Canada [23]. In the case of HRSV-B strains,
phylogenetic analyses have identified at least 13 genotypes (GB1-
GB4 [10], SAB1-SAB3 [12] and BA1-BA6 [17]). Similar to the
ON1 genotype, a 60-nt duplication in the G gene of HRSV-B was
identified in 1998 in Buenos Aires. This novel HRSV-B subgroup
was named the BA genotype [24]; it spread worldwide within 7
years, replacing the other prevailing HRSV-B genotypes by
2005. It is remarkable that neither the ON1 nor the BA genotype
has been associated with a different virulence profile, even though
major changes occurred in the distal fragment of the G gene.
Moreover, no consistent association has been established between
severity of disease and HRSV genotype. 

Recently, new evidence has shown that the three surface
proteins of HRSV (G, F and SH) have relevant roles in
pathogenesis and the host immune response. The G protein is
involved in viral attachment [25] through heparan sulfate-like
moieties (proteoglycans) on the apical surface of epithelial cells
[26]. The extensive glycosylation in the hypervariable regions of
the ectodomain has been shown to be relevant for virus infectivity
[27], probably in relation to its complex secondary structure [8].
Nevertheless, no specific pathogenic role has been attributed to
this "mucin-like" conformation. A CX3C motif, similar to the
CX3C domain of the chemokine fractalkine, has been described in
the central conserved domain of the G protein. This motif has
been shown to mimic the leukocyte chemotactic activity of
fractalkine in vitro [28] and to inhibit NF-κB activation and the
secretion of inflammatory cytokines by human monocytes,
suggesting an inhibitory role on the host innate immune response
to HRSV [29]. The secreted form of the G protein (soluble G),
generated by initiation of translation at a second AUG codon of
the open reading frame [30], has been shown to inhibit Toll-like
receptor (TLR) 3/4-mediated IFN-beta in vitro [31], suggesting
that it may also play a role in suppressing the innate immune
response. 

The F protein, which mediates fusion and the subsequent
formation of syncytia, is assembled into trimers at the membrane
surface and is modified by the addition of N-linked carbohydrates.
A conserved hydrophobic fusion peptide region has been described,
followed by two heptad repeat domains separated by a
cysteine-rich region [32] shown to be important for fusion activity
[33]. A cytoplasmic tail at its C-terminal end is relevant for virion
assembly [34] through the recruitment of viral proteins into
filaments at the cell surface. The fusion protein plays a major role
in viral attachment, interacting with the recently described HRSV
receptor nucleolin [35]. The F protein induces the innate immune
response by interacting with TLR-4 on human leukocytes [36]
and possibly epithelial cells, and by promoting p53-dependent
apoptosis [37]. 

SH is a highly conserved small surface protein [38,39],
with a hydrophobic core possibly corresponding to a single
transmembrane domain [40]. Molecular modeling studies have
suggested that it forms pentamers or hexamers [41] with a
circular structure and a central pore [42]. Although its role in the
HRSV life cycle is not clear, deletion of the protein has been
shown to attenuate replication in animal models [43,44]. It has
been described as an ion channel [40] mediating membrane
permeability [41] and, more interestingly, it has been implicated in
inhibition of apoptosis via the TNF-α pathway [45,46]. However,
the genetic variability of the genes encoding the three surface
proteins of RSV through successive epidemics, and their
phylogenetic topologies, have not been studied. 

In this manuscript we describe the genetic and amino acid
variation in the three major surface proteins of RSV from 60
clinical isolates collected from 1987 to 2005 at one medical
center. We compare the major contemporary genotypes to the
prototype genotypes detected in the late 1950s and early 1960s
and provide genetic information on the domains that drive
variation in the SH, G and F surface proteins. 

Materials and Methods 

Ethics Statement 

HRSV isolates were selected from the biorepository securely
stored in a limited-access laboratory of the senior investigator
(PAP) at the Department of Molecular Virology and Microbiology
of Baylor College of Medicine. All samples stored in the
biorepository are assigned a unique laboratory number and
barcode, and are linked to a secured database with limited access.
All nasopharyngeal aspirates, nasal wash samples and throat swab
samples were collected from consenting participants in IRB-approved
studies at Baylor College of Medicine. Future-use authorization at
the time of consent was required for storing virus-positive samples
in the biorepository. Many of the RSV isolates reported in this
study were collected before future-use authorization was mandated.
For this study, no additional IRB approval was obtained for
sequencing coded RSV isolates. Clinical data from the RSV-infected
subjects were not provided other than the year and city in which
the virus was isolated. 

Virus strains 

The study included sixty original clinical isolates of HRSV
collected from children with lower respiratory symptoms in
Houston, Texas, from 1987 to 2005. The samples were selected
from the biorepository at the Department of Molecular Virology
and Microbiology of Baylor College of Medicine, where they were
stored at -80°C. Five prototype strains used in our laboratory for
research were also included: RSV-A-USA-Long-56, RSV-A-TX-
Tracy-Oct87, RSV-A-Bernett-61, RSV-A2-AUS-61 and RSV-B-
CH-18537-62. The number at the end of each name represents
the year of isolation in the twentieth century. The original clinical
isolates collected in Texas were isolated in HEp-2 cell culture
tubes. Viral cultures with cytopathic effect (CPE) were passaged a
second time in a 24-well plate plaque assay. A third passage was
performed in a 25 cm² flask with a HEp-2 monolayer. The flask
was harvested at day 3 or 4 post-inoculation, when the monolayer
demonstrated approximately 75% CPE. Infected cells were lysed
using sterile glass beads; the supernatant was sonicated and then
clarified by low-speed centrifugation. The clarified supernatant was
mixed with an equal volume of \b% glycerol in Iscove's DMEM to
stabilize the virus during freeze/thaw conditions. Aliquots were
made, snap-frozen, and stored at -70°C for future analysis. 

Primer design 

The strategy used to amplify the region of the viral genome that
includes the SH, G and F genes is illustrated in Figure 1. A
previously described approach by Kim et al [47] was used to
obtain the F1 region of the F gene. The remaining primers (IDT,
Iowa, USA) were designed based on sequence data available
from GenBank: U50362 (RSV A2) and NC_001781 (RSV B1).
Briefly, the segment of interest on the RSV genome was amplified
by PCR as five overlapping fragments of 547 to 1,262 bp in
length, as shown in Table 1. For fragments F1 and F2, we used
the same primer sets to amplify HRSV-A and HRSV-B strains
because these regions are highly conserved in both RSV groups
(F1-AB-FWD/F1-AB-REV and F2-AB-FWD/F2-AB-REV). For
the other regions, we used different primer sets to amplify
RSV-A and RSV-B viruses. 
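To illustrate how the fragment lengths in Table 1 follow from the primer sequences, the Python sketch below locates a forward primer and the reverse complement of a reverse primer on a template and returns the expected amplicon length. It is a hypothetical helper, not the design software used in this study; it assumes exact primer matches, primers written 5' to 3', and a template in the same orientation as the forward primer (for example, a GenBank reference such as U50362, which is not reproduced here).

```python
# Hypothetical helper: expected PCR product length from exact primer matches.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (IUPAC ambiguity codes are not handled)."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template, fwd_primer, rev_primer):
    """Return the expected amplicon length in bp, or None if a primer site is absent.

    Primers are given 5'->3' as in Table 1; the spaces used there for
    readability are removed. Only exact matches are searched (an assumption;
    real primers can tolerate mismatches).
    """
    fwd = fwd_primer.replace(" ", "").upper()
    rev_site = reverse_complement(rev_primer.replace(" ", "").upper())
    template = template.upper()
    start = template.find(fwd)
    if start < 0:
        return None
    # The reverse primer anneals downstream of the forward primer, on the opposite strand.
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    toy = "GGACGTACTTTTAATGCCCC"  # synthetic template, not an RSV sequence
    print(amplicon_length(toy, "ACG TAC", "GGC ATT"))  # 16
```

The product spans from the first base of the forward primer to the last base of the reverse-primer binding site, which is why both primer lengths are included in the reported fragment size.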

RNA extraction and Reverse Transcription PCR 

Viral RNA was extracted from the supernatant of third-passage
HEp-2 cell cultures of the RSV clinical isolates using the Mini Viral
RNA Kit (Qiagen Sciences, Maryland, USA) on the automated
QIAcube platform (Qiagen, Hilden, Germany) according to the
manufacturer's instructions. cDNA was generated by reverse
transcription using random hexanucleotide primers
(New England BioLabs, Ipswich, MA, USA) and avian
myeloblastosis virus (AMV) reverse transcriptase (Life Sciences, 



PLOS ONE I www.plosone.org 



2 



March 2014 | Volume 9 | Issue 3 | e90786 



Genetic Variability of Respiratory Syncitial Virus 




Figure 1. Sequencing Strategy. The region of the HRSV genome (SH-G-F, ~3,500 bp) was amplified by PCR using five overlapping fragments, ranging from
547 to 1262 bp in length. Sequencing was performed in forward and reverse directions, and a final consensus was obtained by contig assembly for
each sample. 

doi:10.1371/journal.pone.0090786.g001 



St. Petersburg, FL, USA) at 43°C for 60 minutes in a 2720 
Thermal Cycler (Life Technologies, Applied Biosystems, Carlsbad, 
CA). Amplification of each gene fragment was performed using 
Platinum® Taq Polymerase (Invitrogen) in a 2720 Thermal
Cycler. Primers and PCR conditions are shown in
Table 1. PCR products were visualized by 1% agarose gel 
electrophoresis. 



DNA sequencing and contig construction 

PCR products were purified using ExoSAP-IT reagent
(Affymetrix, Inc) according to the manufacturer's instructions and sent for
Sanger sequencing at Genewiz (South Plainfield, NJ). 
The sequencing was performed in forward and reverse directions 
for each gene fragment and the quality score of chromatogram 
traces was confirmed by visualizing them with BioEdit Sequence 



Table 1. Primers and specific conditions used for PCR amplification of the RSV surface protein genes. 

| RSV gene | RSV group | Primers | PCR conditions | Fragment length (bp) |
|---|---|---|---|---|
| SH | A | SH-A-FWD: CCC AGA TCA TCC CAA GTC AT; SH-AB-REV: TGT TTG GAC ATG GTT GCA TT | 32 cycles: (94°C×30"-59°C×30"-72°C×1') | 547 |
| SH | B | SH-B-FWD: CAC AAA CCA ATC CCA CTC AA; SH-AB-REV: TGT TTG GAC ATG GTT GCA TT | 32 cycles: (94°C×30"-59°C×30"-72°C×1') | 571 |
| G1 | A | G1-A-FWD: TCA AGC AAA TTC TGG CCT TA; G1-A-REV: GGT TTT TTG TTG GGT ATT CTT TTG C | 35 cycles: (94°C×30"-58°C×30"-72°C×1') | 935 |
| G1 | B | G1-B-FWD: ACA AGC AAA TTT TGG CCC TA; G1-B-REV: CAG GGA ACG AAG TTG AAC AC | 35 cycles: (94°C×30"-58°C×30"-72°C×1') | 874 |
| G2 | A | G2-A-FWD: GAA GTG TTC AAC TTT GTA CC; G2-A-REV: CTG CAA TTC TGT TAC AGC AT | 35 cycles: (94°C×30"-55°C×30"-72°C×1') | 755 |
| G2 | B | G2-B-FWD: CAC ACC ACA CAA CAG CAC AA; G2-B-REV: CCC AGA AAT CTT CGT TTC CTC | 35 cycles: (94°C×30"-55°C×30"-72°C×1') | 1015 |
| F1 | A&B | F1-AB-FWD*: GGC AAA TAA CAA TGG AGT TG; F1-AB-REV: AAG AAA GAT ACT GAT CCT G | (94°C×30"-48°C×30"-72°C×1')×5 cycles, then (94°C×30"-55°C×30"-72°C×1')×35 cycles | A: 1047; B: 1065 |
| F2 | A&B | F2-AB-FWD*: TCA ATG ATA TGC CTA TAA CA; F2-AB-REV: GGA CAT TAC AAA TAA TTA TGA C | (94°C×30"-48°C×30"-72°C×1')×5 cycles, then (94°C×30"-55°C×30"-72°C×1')×35 cycles | A: 1255; B: 1262 |

*: Primers described in [47].

doi:10.1371/journal.pone.0090786.t001 
Alignment Editor Version 7.0.9.0 [48]. The SeqMan® program of the
Lasergene 8 suite (DNASTAR, Inc, Madison, WI) was used for
contig assembly and to obtain the final SH-G-F gene sequence for
each clinical isolate (~3,500 bp). The nomenclature adopted for
our isolates included the RSV group (A or B), followed by the
place of isolation (Texas, TX), the laboratory isolate number and
the date of isolation (e.g., A-TX-79218-Nov 04). Finally, EditSeq
of the Lasergene 8 suite was used to extract the coding sequences
(CDS) of the SH, G and F genes for further analysis. For the
genotype analysis, a fragment of ~270 bp located in the second
hypervariable region of the G gene was obtained for each strain
[10]. 

Phylogenetic analysis 

The phylogenetic analyses were performed with R software (R
Development Core Team, 2009) using recently developed
phylogenetic tools (the APE [49] and phangorn [50] packages).
Multiple sequence alignments were performed with CLUSTAL W
for the entire SH-G-F gene fragment and with CLUSTAL X
(version 1.83) for the CDS sequences [51]. Analyses included
likelihood-based model selection, seeded with a UPGMA tree, to
identify the most appropriate nucleotide substitution model,
followed by maximum likelihood analysis of phylogeny with
calculation of evolutionary distances. Bootstrap analyses were
performed using the bootstrap.pml method in the phangorn
package with 1,000 replicates to evaluate the tree topology and
the group structures identified. Bootstrap values >75% were
considered significant. 
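The bootstrap rests on resampling alignment columns with replacement. The Python sketch below shows only that resampling step and the >75% support rule; phangorn's bootstrap.pml additionally refits the maximum likelihood tree to every pseudo-replicate, which is not reproduced here. The function names are hypothetical, and `clade_present` stands in for whatever per-replicate clade check a tree library would provide.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: resample alignment columns (sites) with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_sites = lengths.pop()
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Every sequence uses the same resampled columns, so site patterns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_reps=1000, seed=None):
    """n_reps pseudo-replicates (1,000 matches the number used in this study)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_reps)]


def support_percent(clade_present):
    """Percentage of replicate trees that recover a given clade."""
    return 100.0 * sum(clade_present) / len(clade_present)


def is_supported(support, threshold=75.0):
    """Apply the >75% significance cut-off."""
    return support > threshold
```

In use, a tree would be estimated from each element of `bootstrap_replicates(alignment)`, a boolean recorded for whether a genotype cluster is recovered, and `support_percent` applied to those booleans.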

The genotype distribution of the clinical isolates was analyzed
using the methodology described by Peret et al [10], which
compared the second hypervariable region of the G gene. To
improve the robustness of the analysis, HRSV gene sequences
were retrieved from GenBank as references for comparison with
the Houston clinical isolates (Table S1). A total of 62 partial G
gene sequences of HRSV-A (270 nt) and 53 of HRSV-B (270 to
330 nt) were retrieved from GenBank. A phylogenetic analysis of
the SH, G and F genes (CDS) was performed in relation to the
genotype distribution of clinical isolates collected in Houston
from 1987 to 2005. 

Genetic variability analysis 

The variability within and between groups was examined for
each site in each gene, together with a cross-tabulation of
patterns of variation against previously annotated functional
regions. In the F gene there are 6 regions for which, to our
knowledge, no function has been previously described. These
regions were termed "not defined" and numbered sequentially
from 1 to 6. At the nucleotide level, distributional divergence was
measured by calculating the Kullback-Leibler (KL) divergence at
each site, comparing nucleotide distributions between the groups
defined by the phylogenetic analysis (i.e. the colored groups in the
trees shown in Figure 2). To obtain information about the
variation of the contemporary clinical strains, comparisons were
performed between genotype GA1 (prototype isolates) and the
others in the HRSV-A group, and between strain B-CH-18537-62
(prototype strain) and the others for HRSV-B strains. After the
KL divergence values across all sites were examined using kernel
density plots, KL values of 0.5 or more per nucleotide site were
considered significant. The percentages of these divergent sites
across the functional regions were calculated. 
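A minimal Python sketch of this per-site KL screen follows. It rests on details the text does not specify, so treat them as assumptions: divergence is computed as D(P_group || P_reference) in natural-log units, gaps and ambiguity codes are ignored, and a pseudocount of 0.5 per base keeps the divergence finite when a nucleotide is absent from one group. Changing the direction, log base or pseudocount changes the values, so the 0.5 cut-off only carries its meaning under the authors' own choices. Function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def site_distribution(column, pseudocount=0.5):
    """Nucleotide frequencies at one alignment column.

    Gaps/ambiguity codes are ignored and a pseudocount is added to every base
    (both assumptions) so that the KL divergence stays finite.
    """
    counts = Counter(b for b in column if b in BASES)
    total = sum(counts.values()) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats."""
    return sum(p[b] * math.log(p[b] / q[b]) for b in BASES)


def divergent_sites(group_seqs, ref_seqs, threshold=0.5):
    """0-based indices of sites whose KL divergence from the reference group is >= threshold."""
    n_sites = len(group_seqs[0])
    hits = []
    for i in range(n_sites):
        p = site_distribution([s[i] for s in group_seqs])
        q = site_distribution([s[i] for s in ref_seqs])
        if kl_divergence(p, q) >= threshold:
            hits.append(i)
    return hits


def percent_divergent_by_region(hits, regions):
    """Percent of divergent sites per region; regions map name -> (start, end), 0-based, end exclusive."""
    hit_set = set(hits)
    return {
        name: 100.0 * sum(1 for i in range(start, end) if i in hit_set) / (end - start)
        for name, (start, end) in regions.items()
    }
```

Here the "reference" group would be, for example, the GA1 prototype sequences for HRSV-A, and `regions` the annotated functional domains in alignment coordinates.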



Analysis of differences between HRSV viruses by 
genotypes and protein domains 

The variability within and between genotypes and genes was
investigated by analyzing the ratio of non-synonymous (Dn) to
synonymous (Ds) variation by domain, to identify the viral
domains that appeared divergent between genotypes. This
analysis used the multiple sequence alignments together with
strain inferences. Translated amino acid sequences were
determined from the alignments, and DNA-level variation was
compared with amino acid variation. Sites were considered
synonymously variant when they displayed nucleotide variation
but not amino acid (aa) variation between a pair of sequences. We
grouped sequences into HRSV genotypes according to the
phylogenetic analysis and tabulated the total number of
non-synonymous and synonymous changes between sequences in
different genotypes, using all pairwise combinations of sequences
from different groups. Comparisons were performed between the
sequences from groups GA1 (prototype viruses), GA2 and GA5
for HRSV-A, and between groups GB3, GB4 and BA
(contemporary B isolates) for HRSV-B. Variation among viruses
was recognized when Dn/Ds (ω) was greater than 1, while ω < 1
was considered convergent and ω = 1 was taken to indicate
neutral selection. Functional annotation of sequence regions was
superimposed on these results to reveal patterns in this statistic,
correlating the Dn/Ds tabulation with prior functional knowledge
of protein regions. 
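The tabulation above can be sketched in Python as follows. This sketch assumes a codon-level comparison of in-frame, aligned CDS and, following the text, uses raw counts summed over all cross-group pairs; it does not apply a per-site normalization such as the Nei-Gojobori method. The genetic code is the standard one (NCBI table 1); the function names and the `domain` argument are hypothetical.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AA)}


def count_changes(seq1, seq2):
    """Return (non-synonymous, synonymous) codon differences between two aligned, in-frame CDS."""
    dn = ds = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip identical codons and codons containing gaps or ambiguity codes (assumption).
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] != CODON_TABLE[c2]:
            dn += 1
        else:
            ds += 1
    return dn, ds


def dn_ds_between_groups(group_a, group_b, domain=None):
    """Dn/Ds from raw counts summed over every cross-group pair of sequences.

    `domain` is an optional (start, end) slice in 0-based, end-exclusive
    nucleotide coordinates that should begin on a codon boundary.
    """
    start, end = domain if domain else (0, None)
    dn_total = ds_total = 0
    for s1 in group_a:
        for s2 in group_b:
            dn, ds = count_changes(s1[start:end], s2[start:end])
            dn_total += dn
            ds_total += ds
    if ds_total == 0:
        # No synonymous changes: ratio is unbounded, or undefined if nothing changed.
        return math.inf if dn_total else math.nan
    return dn_total / ds_total
```

Calling `dn_ds_between_groups(ga1_seqs, ga5_seqs, domain=(start, end))` for each annotated domain would give one ratio per domain and genotype pair, which can then be laid against the functional annotation.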

Nucleotide sequence accession numbers 

The sequences from Texas, USA reported in this study were 
deposited in GenBank database under accession numbers 
JX198105 to JX198169. 

Results 

Nucleotide sequencing of the SH, G and F genes of HRSV
was performed on the 60 original isolates, covering the period from
1987 to 2005. The number of original samples analyzed per
epidemic period was: 1 for 1987, 3 for 1991-92, 18 for 1992-93, 5
for 1993-94, 1 for 1994-95 and 32 for 2004-05. Prototype strains
from 1956 (A-Long, USA), 1961 (A2, Australia), 1961 (A-
Bernett, USA), 1962 (B-18537, USA) and 1987 (A-Tracy, TX)
were also sequenced. In total, 35 HRSV-A strains and 30 HRSV-B
strains were studied. 

The analyzed genes (SH-G-F) from HRSV-A samples
corresponded to nucleotides 4,171 to 7,630 in the reference strain A2
from GenBank (accession number U50362 [52]). For HRSV-B
isolates, the segment corresponded to nucleotides (nt) 4,144 to
7,391 of the B1 strain in GenBank (accession number NC_001781
[53]). The total length ranged from 3,454 to 3,460 nt for HRSV-A
samples and from 3,246 to 3,307 nt for the HRSV-B group. The
previously described 60-nt duplication in the G gene [24],
characteristic of Buenos Aires (BA) strains, was detected in 7
HRSV-B original samples from the 2004-2005 epidemic season.
A previously described six-nucleotide deletion [15] was also
detected in four of the BA strains (TX-B-79226-Nov04, TX-B-
79233-Nov04, TX-B-79247-Nov04 and TX-B-79307-Jan05).
Nucleotide comparison between the HRSV-A strains and HRSV
A-Long-1956, the oldest strain sequenced, showed 93.2 to 95.2%
identity. Only one pair of samples collected during the same
epidemic season had 100% identity in the SH-G-F genes
(TX-A-79230-Nov04 and TX-A-79310-Jan05). The identity
between HRSV-B strains and reference HRSV B-18537-1962 was
95.1 to 96.5%. Three HRSV-B samples collected 
during the same epidemic period had 100% identity in their SH-G-F
genes (TX-B-60188-Dec92, TX-B-60462-Jan93 and TX-B3-61077-Feb93).

[Figure 2 image: panel A legend GA1-GA7, SAA1, Local; panel B legend GB1-GB4, SAB2, SAB3, BA, Local; scale bars 0.03]

Figure 2. Phylogenetic trees of HRSV strains of group A (A) and group B (B), based on the second hypervariable region of the G gene as
described by Peret et al [10]. Selected previously described sequences from around the world (color) were retrieved from GenBank and compared to local
strains from Houston, TX (black). Phylogenetic trees were constructed by maximum likelihood analysis, and bootstrap values were
calculated to support clustering. Only bootstrap values greater than 75% are shown. 
doi:10.1371/journal.pone.0090786.g002 

Phylogenetic Trees and Genotype analysis 

To determine the genotype classification of our isolates,
phylogenetic analysis was performed by comparing the last 270-nt
fragment of the G gene of the HRSV isolates to sequences with
previous genotype designations [10-13,15,16,54,55]. The resulting
phylogenetic trees are shown in Figure 2. For HRSV-A strains
(Figure 2A), two major branches were identified. The first
consisted of the previously described GA1 genotype, which
included all the prototype strains: A-USA-Long-56, A-TX-
Tracy-Oct87, A-USA-Bernett-May61 and A2-AUS-61. The
A-USA-Long-56 strain is separated from the rest of the GA1
isolates, a distance supported by a bootstrap value of 100%. On the
second major branch, where all the original TX samples are
located except for A-TX-Tracy-Oct87, bootstrap values of
100% support clusters that correspond to previously described
genotypes: GA2, GA4, GA5 and GA7. A clear differentiation was
not observed for genotypes GA3, GA6 and SAA1, which appeared
to group together. The TX samples primarily grouped with the
GA5 and GA2 genotypes. Twenty of the TX HRSV-A isolates
clustered with GA5 viruses, corresponding to isolates collected
during 1991, 1993, 1994, 2004 and 2005. Nine other TX
HRSV-A isolates collected during the 2004-2005 epidemic
clustered in the GA2 genotype. One TX HRSV-A isolate from
the 2005 RSV season grouped with the GA7 genotype (A-TX-
79303-Jan05), and another TX HRSV-A strain collected in 1987
clustered close to the GA3-GA6-SAA1 group (A-TX-37425-
Dec87). 

The distribution of HRSV-B strains is shown in Figure 2B. The
prototype HRSV strain, B-CH-18537-62, is located in a separate
branch distant from all other strains. Two of the TX HRSV-B
isolates, collected in 1991 and 1993, grouped with the GB1
genotype, as supported by a bootstrap value of 100% (B3-TX-
57097-Dec91 and B-TX-61138-Feb93). Clusters of the GB2 and
SAB2 genotypes with 100% bootstrap values were recognized,
but none of the TX HRSV-B isolates were detected in those
genotypes. Thirteen HRSV-B TX isolates collected during the
1992-93 and 1993-94 RSV epidemics were identified in the GB4
genotype cluster. The differential grouping of BA genotype
strains from GB3 genotype viruses, as described by Trento et al
[17,20], was not achieved. All HRSV-B samples collected during
the 2004-05 epidemic (N = 7) clustered close to strains previously
described as BA, but a bootstrap value greater than 75% was not
achieved. As expected, nucleotide analysis of these 7 strains
confirmed the presence of the 60-nt duplication in the second
hypervariable region of the G gene. Five HRSV-B TX viruses isolated
in 1993 and 1994 grouped within the GB3 viruses. Finally, 2
HRSV-B TX strains from 1993 formed an independent main
lineage not related to other known genotypes (B-TX-60823-
Jan93 and B-TX-64817-Dec93). 

Phylogenetic trees were next constructed for the coding
sequences (nucleotide sequences between the AUG and stop codons)
of the SH, G and F genes (Figure 3) and colored according to the
genotype assigned from the tree and group structure determined
using only the distal hypervariable fragment of the G gene
(Figure 2). The SH CDS sequences (Figure 3A) of TX HRSV-A
produced a phylogenetic tree similar to that of the hypervariable
fragment of the G gene, except for the GA1 strains, which were
divided into two distinct branches with RSV-A-Tracy-Oct87
appearing as an outgroup. As shown in Figure 3B, the G gene CDS
of TX HRSV-A samples produced a tree similar to that obtained
with the distal hypervariable fragment of the G gene. Groups GA2
and GA5 were recognized and supported by 100% bootstrap
values, and the prototype strains clustered in the GA1 group. A
phylogenetic tree constructed with the F gene CDS sequences of
TX HRSV-A (Figure 3C) was similar to the clustering produced
with the G gene CDS. 

In the case of the TX HRSV-B isolates, the phylogenetic tree of
the SH gene CDS (Figure 3D) did not have the expected branches.
The GB1 isolates clustered together, while the GB4, GB3 and BA
strains were grouped in non-distinct branches, and the tree was
poorly resolved overall. The phylogenetic tree of the G gene CDS
(Figure 3E) of TX HRSV-B resulted in well-defined clustering of
genotypes GB1, GB3, GB4 and BA. As expected, BA strains
clustered close to GB3, and HRSV B-CH-18537-62 was located
alone in a separate branch. The phylogenetic tree of the F gene
CDS of TX HRSV-B (Figure 3F) differed from the phylogenetic
trees constructed with the G gene CDS (Figure 3E) and with the
distal hypervariable domain (Figure 2B). The HRSV
B-CH-18537-62 strain clustered with GB1 genotype isolates with a
100% bootstrap value. The BA strains were more distant from the
GB3 isolates, as measured by the total number of changes, than in
the groupings obtained with the G gene CDS and with the 270-nt
fragment of the G gene. 

Nucleotide sequence analysis by genes and functional 
domains 

Multiple alignments showed different evolutionary distances for the SH, G and F genes of the TX HRSV-A and HRSV-B isolates and confirmed that the G gene was the most variable. The mean distances for the SH, G and F CDS of TX HRSV-A isolates were 0.024, 0.057 and 0.026, respectively (Wilcoxon p<0.001), measured as the mean number of differences per site. The mean distances for the TX HRSV-B strains were 0.015, 0.041 and 0.012 for the SH, G and F gene CDS, respectively (Wilcoxon p<0.001).
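The "mean number of differences per site" can be read as an uncorrected p-distance averaged over pairs of aligned sequences. The sketch below illustrates that calculation under two assumptions not stated in the text: gap positions are dropped pair by pair, and the average is taken over all unordered pairs within a group. The function names are hypothetical, and the Wilcoxon test comparing genes is not reproduced.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites that differ (uncorrected p-distance).

    Sites with a gap ('-') in either sequence are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns for this pair only
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

For the toy alignment ATGCATGCAT, ATGCATGCAA and ATGAATGCAA, the three pairwise distances are 0.1, 0.2 and 0.1, so the mean is 0.4/3 (about 0.133). Applying the function separately to the SH, G and F CDS alignments of one group would give per-gene means comparable in kind to those reported above.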

SH gene. All the TX HRSV-A samples analyzed had an SH CDS of 195 nt. For TX HRSV-B isolates, the SH CDS was 198 nt in length. No deletions, duplications, insertions, or alternative stop codons were detected in these sequences. Nucleotide variability was assessed by domain, and pairwise comparisons were made between the prototype viruses and the TX isolates (Table 2, Figure 4). The hydrophobic core region of TX HRSV-A isolates was the most variable (6/72 sites with KL divergence >0.5), followed by the extracellular domain (5/81) and the cytoplasmic domain (0/42). For TX HRSV-B isolates, the extracellular domain showed the greatest variability (11/84), followed by the hydrophobic core (4/72) and the cytoplasmic domain (2/42).
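The per-site divergence counts above (sites with KL >0.5, later expressed as a percentage of domain length in Table 2) can be sketched as follows. This is an illustration only: the pseudocount of 0.5, the base-2 logarithm and the direction of the divergence (other strains relative to the reference group, GA1 or B-CH-18537-62 per Figure 4) are assumptions; the text specifies only the 0.5 threshold. All function names are hypothetical.

```python
from collections import Counter
from math import log2

NUCLEOTIDES = "ACGT"


def column_frequencies(seqs, pos, pseudocount=0.5):
    """Nucleotide frequencies at one alignment column.

    A pseudocount keeps every frequency above zero so the KL divergence stays
    finite; gaps and ambiguous bases are simply not counted.
    """
    counts = Counter(s[pos].upper() for s in seqs)
    total = sum(counts[n] for n in NUCLEOTIDES) + pseudocount * len(NUCLEOTIDES)
    return {n: (counts[n] + pseudocount) / total for n in NUCLEOTIDES}


def kl_divergence(p, q):
    """KL(P || Q) in bits between two distributions over A, C, G, T."""
    return sum(p[n] * log2(p[n] / q[n]) for n in NUCLEOTIDES if p[n] > 0)


def divergent_sites(reference, others, threshold=0.5):
    """0-based alignment positions where KL(others || reference) > threshold."""
    length = len(reference[0])
    return [
        pos
        for pos in range(length)
        if kl_divergence(column_frequencies(others, pos),
                         column_frequencies(reference, pos)) > threshold
    ]


def percent_divergent(sites, domain_start, domain_end):
    """Percentage of divergent positions in a 0-based, half-open domain."""
    n = sum(domain_start <= s < domain_end for s in sites)
    return 100.0 * n / (domain_end - domain_start)
```

With two reference sequences AAAA and three other sequences AAAT, only the last column exceeds the threshold. The percentage step is plain division: 6 divergent sites in the 72-nt SH hydrophobic core gives 8.33%, the value listed in Table 2.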

G gene. The length of the G gene of TX HRSV-A isolates ranged from 852 to 897 nucleotides. Two alternative stop codons caused by nucleotide substitutions at positions 850 and 892 were detected, with predicted G protein lengths of 283 (TX-A-79312-Jan05) and 297 amino acid (aa) residues (prototype strains RSV A-Bernett-61 and A-Tracy-Oct87, plus TX-A-37425-Dec87 and A-TX-79256-Dec04). The rest of the isolates were 897 nt (298 aa) in length. Among TX HRSV-B isolates, G CDS lengths ranged from 669 to 954 nt. As noted previously, a 6-nt in-frame deletion was detected after position 475 in four of the seven BA isolates collected in the 2004-2005 epidemic season. A 60-nt duplication was also detected after position 780 in all seven BA isolates collected in the 2004-05 RSV season. In addition, three alternative stop codons were identified. First, a single-nucleotide deletion after nucleotide 584 and a 2-nt insertion after position 591 resulted in a premature TGA codon after the frameshift. Thus, two isolates from the 1992-1993 epidemic had predicted G proteins of 222 and 223 aa, respectively (TX-B-60911-Jan93 and TX-B-61326-Feb93). Second, prototype RSV-B-CH-18537-62, plus 3 other isolates from the 2004-2005 season, had a premature stop codon due to substitutions at positions 877 and 823, respectively. The predicted lengths of these proteins were 292 aa for the prototype RSV-B virus and 310 aa for the others (TX-B-79226-Nov04, TX-B-79233-Nov04 and TX-B-79307-Jan05). The last alternative stop codon, and the most frequent in our isolates, led to protein lengths of 295 and 315 aa. Only two of the TX HRSV-B strains used the stop codon originally described in the reference B1 strain (accession number NC_001781), resulting in proteins of 299 and 317 aa residues (TX-B-57097-Dec91 and TX-B-79247-Nov04).
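The predicted protein lengths above follow from the position of the first in-frame stop codon: a stop codon whose first base is at nucleotide position p (1-based) leaves (p - 1)/3 sense codons before it. The helpers below are a hypothetical sketch of that arithmetic. They read the sequence in frame from its first base, so indels that cause frameshifts must already be present in the sequence passed in; alignment gaps are removed first.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def predicted_protein_length(cds):
    """Number of amino acids encoded before the first in-frame stop codon.

    Codons are read from the first nucleotide (frame 1) after removing
    alignment gaps. If no stop codon is found, every complete codon counts.
    """
    seq = cds.upper().replace("-", "")
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return len(seq) // 3


def stop_position_to_length(nt_position):
    """Protein length implied by an in-frame stop codon whose first base is at
    1-based nucleotide position nt_position (1, 4, 7, ...)."""
    if (nt_position - 1) % 3:
        raise ValueError("position is not the first base of an in-frame codon")
    return (nt_position - 1) // 3
```

For example, stop_position_to_length(850) returns 283 and stop_position_to_length(892) returns 297, matching the two HRSV-A G lengths above; 877 gives the 292 aa of the B-CH-18537-62 prototype, and a stop at position 1723 (the last codon of a 1,725-nt CDS) gives the 574 aa predicted for F.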

The analysis of nucleotide variability by domain is shown in Table 2 and Figure 4. As expected, the first and second hypervariable regions of the G gene had high nt variability (>10% divergent sites) in both the TX HRSV-A and HRSV-B isolates. The pattern of variation across domains was comparable between the RSV-A and RSV-B groups, although TX HRSV-B isolates also had high nt variation in the cytoplasmic tail domain.

F gene. All the TX HRSV-A and HRSV-B isolates had an F CDS of 1725 nt in length. No insertions, deletions, duplications, or alternative stop codons were detected in the F CDS, giving a predicted length of 574 aa for all isolates. Nucleotide variability by domain is shown in Table 2 and Figure 4. Most of the domains in the F gene of TX HRSV-A and HRSV-B isolates had nt variability <10%. The exceptions were the signal peptide and transmembrane domains of the TX HRSV-A isolates, with nt variability as high as or higher than that detected in the hypervariable domains of the G gene. In addition, the antigenic domains (I, II, IV and 0) of the F gene had similar or lower nt variability compared to the central conserved domain of the G gene for both HRSV-A and HRSV-B viruses.

[Figure 3 panels (phylogenetic trees): A) HRSV-A: SH CDS; B) HRSV-A: G CDS; C) HRSV-A: F CDS; D) HRSV-B: SH CDS; E) HRSV-B: G CDS; F) HRSV-B: F CDS]

Figure 3. Phylogenetic trees of HRSV-A and -B, based on coding sequences (CDS) of SH, G and F genes. Phylogenetic trees were constructed by the maximum likelihood method, and bootstrap values were calculated to support clustering. Only bootstrap values greater than 75% are shown. Colors of strains correspond to the genotype classification shown in Figure 2. A), B) and C): coding sequences of the SH, G and F genes, respectively, of HRSV-A strains. D), E) and F): coding sequences of the SH, G and F genes, respectively, of HRSV-B strains.
doi:10.1371/journal.pone.0090786.g003

Amino acid variation by genes and functional domains of HRSV strains

The relative non-synonymous/synonymous substitution ratios (Dn/Ds) were calculated for each domain by comparing HRSV-A strains and HRSV-B strains (Table 3, Figure 5) using all pairwise comparisons of samples. For the pairwise comparison analysis, the HRSV-A prototype (GA1) viruses were included in addition to strains of contemporary genotypes. For HRSV-B, the BA viruses (the dominant genotype worldwide) and viruses from other contemporary genotypes were included. Ratio values >1 were considered to identify codons under selection for variation. Thirty-three HRSV-A strains were analyzed: 4 GA1, 9 GA2 and 20 GA5. In total, there were 296 pairwise comparisons per codon. Twenty-five HRSV-B strains were analyzed: 5 GB3, 13 GB4 and 7 BA. In total, there were 191 pairwise comparisons per codon. Most of the nt variability resulted in synonymous aa changes in the HRSV-A and HRSV-B isolates. A few domains appeared to be enriched for sites under selection for variation. In the HRSV-A isolates, these domains were the first and second hypervariable domains of the G protein, and the signal peptide and the "not-defined 2" site of the F protein. In the HRSV-B strains, the domains under selection for variation were the hydrophobic domain and the first and second hypervariable domains of the G protein, as well as the "not-defined 2" site of the F protein. The majority of antigenic sites of the F protein in the HRSV-A and HRSV-B isolates were well conserved, with Dn/Ds ratios ≤0.07. Interestingly, higher variability was found in the recently described antigenic site 0 [56-59], a quaternary structure consisting of two regions related in the F trimer (called 0a and 0b in this manuscript) and located within heptad repeat domains 3 and 1, respectively. The antigenic region 0a of contemporary HRSV-A strains varied greatly from the HRSV-A prototype viruses, with a Dn/Ds ratio of infinity (all 167 aa changes identified in the clinical HRSV-A isolates were non-synonymous compared to the prototype GA1 viruses). In contrast, the HRSV-B isolates showed neutral selection in this region, with a Dn/Ds ratio of 0.9.

Divergent trends in the Dn/Ds ratios between HRSV-A and HRSV-B strains were also observed in five of the SH, G and F domains: the hydrophobic core of the SH protein, the hydrophobic domain of the G protein, and the signal peptide, the "not-defined 2" site and the transmembrane domain of the F protein.

Table 2. Nucleotide variability of the major domains of the SH, G and F genes that encode the three surface proteins of HRSV in the Texas isolates compared to the prototype viruses.

N, number of divergent sites (KL>0.5); nt, domain length in nucleotides; %, percentage of divergent sites per functional domain.

Domain                      HRSV-A                  HRSV-B
                            N     nt     %          N     nt     %
SH gene
  Cytoplasmic domain        0     42     0.00       2     42     4.76
  Hydrophobic core          6     72     8.33       4     72     5.56
  Extracellular domain      5     81     6.17       11    84     13.10
G gene
  Cytoplasmic tail          9     111    8.11       15    111    13.51
  Hydrophobic domain        7     84     8.33       6     84     7.14
  1st hypervariable region  44    255    17.25      32    255    12.55
  Central conserved domain  7     120    5.83       11    117    9.40
  2nd hypervariable region  57    327    17.43      49    327    14.98
F gene
  Signal peptide            15    66     22.73      6     66     9.09
  Not defined 1             4     90     4.44       2     90     2.22
  Heptad repeat domain 3    10    144    6.94       9     144    6.25
  Not defined 2             8     105    7.62       10    105    9.52
  Cleavage site             1     15     6.67       1     15     6.67
  Fusion peptide            1     27     3.70       2     27     7.41
  Heptad repeat domain 1    9     171    5.26       9     171    5.26
  Not defined 3             9     144    6.25       9     144    6.25
  Antigenic site II         3     63     4.76       1     63     1.59
  Not defined 4             15    312    4.81       14    312    4.49
  Antigenic site I          6     63     9.52       0     63     0.00
  Not defined 5             1     63     1.59       2     63     3.17
  Antigenic site IV         0     51     0.00       1     51     1.96
  Not defined 6             9     105    8.57       3     105    2.86
  Heptad repeat domain 2    10    150    6.67       6     150    4.00
  Transmembrane domain      9     72     12.50      2     72     2.78
  Cytoplasmic tail          4     78     5.13       1     78     1.28
  Antigenic site 0a*        2     24     8.33       2     24     8.33
  Antigenic site 0b*        2     45     6.66       4     45     8.88

* Not shown in Figures. Antigenic site 0 consists of two regions (called a and b in this table) that are structurally related in the pre-fusion state and located within heptad repeat domains 3 and 1, respectively [59].
doi:10.1371/journal.pone.0090786.t002

Figure 4. Nucleotide variability according to domains in the SH, G and F genes. KL distances were calculated for each position by comparing genotype GA1 to other HRSV-A strains and reference strain B-CH-18537-62 to other HRSV-B isolates. KL values >0.5 were considered significant. Percentages of divergent sites by domain are shown for HRSV-A and HRSV-B strains.
doi:10.1371/journal.pone.0090786.g004
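The reported totals of 296 and 191 comparisons per codon match the number of pairs whose two members belong to different genotypes (4×9 + 4×20 + 9×20 = 296 for HRSV-A; 5×13 + 5×7 + 13×7 = 191 for HRSV-B). That reading, and the counting below, are our interpretation and are not stated explicitly in the text. The sketch counts codon differences between two aligned, in-frame CDS as synonymous or non-synonymous using the standard genetic code, and returns an infinite ratio when only non-synonymous changes occur, as reported for site 0a. It is a simplified count ratio, not a site-normalized dN/dS estimator such as Nei-Gojobori; all function names are hypothetical.

```python
from itertools import combinations
from math import inf, nan

BASES = "TCAG"
# Standard genetic code laid out in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def between_genotype_pairs(genotype_sizes):
    """Number of pairs whose two members belong to different genotypes."""
    return sum(m * n for m, n in combinations(genotype_sizes, 2))


def count_codon_changes(cds1, cds2):
    """Return (non-synonymous, synonymous) codon differences between two
    aligned, in-frame CDS.

    Simplification: a codon differing at several positions counts once, and
    codons with gaps or ambiguous bases are skipped.
    """
    nonsyn = syn = 0
    for i in range(0, min(len(cds1), len(cds2)) - 2, 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return nonsyn, syn


def change_ratio(nonsyn, syn):
    """Non-synonymous/synonymous ratio: infinite when only non-synonymous
    changes were seen, undefined (nan) when no changes were seen."""
    if syn == 0:
        return inf if nonsyn > 0 else nan
    return nonsyn / syn
```

For instance, comparing CTTAAA with CTCGAA yields one synonymous change (CTT and CTC both encode leucine) and one non-synonymous change (AAA lysine to GAA glutamate). In a full analysis, the counts would be summed per domain over the between-genotype pairs before taking the ratio.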

Discussion 

HRSV infections contribute to the worldwide burden of respiratory morbidity and mortality [1,2,4]. In recent years, viral genetic characteristics have been partially explored, primarily for genotyping the strains that co-circulate during outbreaks [11-22]. The molecular epidemiology of HRSV is based on the phylogeny of the second hypervariable region of the G gene [10]. To our knowledge, this is the first description of a genomic analysis of the three surface proteins of HRSV-A and HRSV-B isolates collected at a single site over an 18-year period. As expected from the literature, most of the TX HRSV isolates clustered in well-defined genotypes based on the second hypervariable region of the G gene: GA2 and GA5 for the HRSV-A isolates, and GB4 and BA for the HRSV-B isolates. Isolates belonging to the genotypes of the prototype HRSV-A and HRSV-B viruses were infrequently detected (3/60), and none were identified after 1993 in our TX collection. Moreover, the prototype genotypes clustered distant from most of the recently circulating strains. This is relevant because prototype HRSV-A viruses have traditionally been used in the formulation of live and subunit candidate vaccines [60-62]. We consider that the formulation of future candidate RSV vaccines and the evaluation of new therapeutics will need to be cognizant of the genetic variation in contemporary HRSV isolates compared to the prototype laboratory strains.

Figure 5. Amino acid variability according to domain in the SH, G and F genes. The non-synonymous/synonymous substitution ratio (Dn/Ds) was calculated for each domain by comparison between GA1, GA2 and GA5 genotype strains for HRSV-A and between GB3, GB4 and BA HRSV-B isolates. Ratio values >1 indicate selection for variation between prototype (GA1) and contemporary viruses for HRSV-A and between dominant (BA) and contemporary viruses for HRSV-B.
doi:10.1371/journal.pone.0090786.g005

The coding sequences of the SH, G and F genes were used to construct phylogenetic trees. We predicted that the tree constructed with the G gene CDS would generate clusters comparable to the genotypes defined with only the second hypervariable region of the G gene, while the trees constructed with the SH and F CDS might not cluster in patterns equivalent to their respective G-fragment-based genotypes. Interestingly, the analyses revealed that the G and F CDS of HRSV-A and HRSV-B generated topologies similar to their genotypes. The tree constructed with the SH CDS of the HRSV-A isolates was consistent with the G-fragment-based genotype topology; however, this was not true for the SH CDS of the HRSV-B isolates. It is an interesting observation that a 270-nucleotide sequence of the second hypervariable region of the G gene is predictive of the phylogenetic topologies formed with the full coding sequences of the G and F genes and, to an extent, of the SH gene.

After analysis of a 3,500-bp fragment, we detected 100% identity in one pair of HRSV-A strains and in three pairs of HRSV-B strains; in contrast, previous reports that genotyped the 270-bp fragment found a higher frequency of identical strains [10,16,20,23,63,64]. Although genotyping the second hypervariable region of the G gene has generated epidemiologically valuable data, we must consider that genetic variation may also be occurring in other regions, especially in variants or genotypes that are consistently associated with greater virulence or survival advantages. Similar in concept to gene polymorphisms in the human genome, genetic variation in the genes encoding the proteins of HRSV is likely to have profound virological or biological effects on disease pathogenesis and might improve survival in the host by avoiding early recognition by the adaptive arm of the immune system.
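Why a longer sequence window finds fewer identical strains can be shown with a small, hypothetical helper: strains identical over a short fragment may still differ elsewhere in the SH-G-F region. The function below reports pairs of named, aligned sequences that are identical within a chosen window (0-based, half-open coordinates, an assumption of this sketch); it is not the procedure used in the study.

```python
from itertools import combinations


def identical_pairs(seqs, start=0, end=None):
    """Return (name, name) pairs whose sequences are identical over
    seq[start:end]. Gap characters are compared like any other character."""
    window = {name: s[start:end].upper() for name, s in seqs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(window), 2)
        if window[a] == window[b]
    ]
```

With the toy alignment {"x": "AAAACCCC", "y": "TTAACCCC", "z": "TTAACCCG"}, no pair is identical over the full length, whereas all three pairs are identical over positions 2 to 7 ("AACCC").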

The molecular epidemiology of HRSV was developed around the G gene because it shows the largest antigenic and genetic differences between the HRSV-A and -B groups [8] and because it is the target of many neutralizing and protective antibodies [26]. The hypervariable domains of the G protein have higher non-synonymous variation, indicative of selection for variation by immune pressure, but no specific role has yet been attributed to these "mucin-like" conformations. Moreover, other functional regions of the G protein have been described as relevant in the pathogenesis of the infection. The F protein, a major protein targeted by the host's immune system [65], is antigenically conserved in its post-fusion form for both HRSV-A and HRSV-B strains. Interestingly, the recently described antigenic site 0 in the pre-fusion F appeared to be variable, especially in contemporary HRSV-A strains relative to the prototype viruses. Monoclonal antibodies generated to antigenic site 0 appear to have greater neutralizing capacity than palivizumab, a monoclonal antibody that targets the highly conserved antigenic site II [58]. The high variability of antigenic site 0 detected in the clinical HRSV-A isolates calls for caution regarding its potential as a vaccine target.

Insufficient information is available on the SH protein. To further address the potential variability of the surface proteins, the nucleotide and deduced amino acid sequences of the SH, G and F CDS were evaluated. Our genetic variability analysis confirmed that the G gene was the most variable, with the SH and F genes having lower and comparable mean distances for both the HRSV-A and HRSV-B strains.

Genetic variability of the Texas isolates was further explored by evaluating nucleotide divergence from the prototype viruses based on known functional sites or domains of the SH, G and F genes. As expected, the first and second hypervariable domains of the G gene of the TX strains in both HRSV groups had more than 10% divergent sites compared to the prototype strains. Remarkably, in the HRSV-B TX strains, the central conserved region showed nucleotide divergence approaching 10%, and the cytoplasmic tail domain showed higher nucleotide variability than the first hypervariable domain. The central conserved region of the G protein is being evaluated for its potential as a vaccine candidate [66], and the cytoplasmic tail has been studied for its possible role in eliciting the innate immune response [29]. In the F gene, the signal peptide and transmembrane domains of the TX isolates of the HRSV-A group demonstrated substantial divergence compared to the prototype strains (22.73% and 12.50% divergent sites, respectively; Table 2). Antigenic site II of the F protein is the target of palivizumab, a highly effective humanized monoclonal antibody used in the immunoprophylaxis of high-risk infants [5]. Nucleotide sequences of antigenic site II of the TX strains appeared well conserved and comparable to the respective prototype HRSV-A and HRSV-B viruses. The nucleotide sequences of antigenic site I of the HRSV-A strains appeared more divergent than the other antigenic domains of the F gene. To our knowledge, there are six domains ("Not-defined" 1 to 6) in the F gene for which no function or antigenic activity has been described. Interestingly, the Not-defined 2 domain in the HRSV-A and HRSV-B isolates and the Not-defined 6 domain in the HRSV-A strains had nucleotide divergence between 5 and 10%. Their significance is unknown at this time. Heptad repeat domains 1 and 2 are target sites for fusion inhibitor compounds [67]. Both of these heptad repeat domains had nucleotide divergence of approximately 5% in the HRSV-A and HRSV-B TX isolates compared to the prototype strains. Finally, in the SH gene, the extracellular domain and hydrophobic core demonstrated the highest frequencies of divergent sites. This study confirms that nucleotide variability is not uniform across the different functional domains of the three surface proteins of HRSV. Moreover, some domains with unknown function are more conserved than others. Our method of performing the selection for variation analysis was based on a direct assessment of non-synonymous and synonymous events per codon within each